A ν-support vector regression based approach for predicting imputation quality

Authors

  • Yi-Hung Huang
  • John P Rice
  • Scott F Saccone
  • José Luis Ambite
  • Yigal Arens
  • Jay A Tischfield
  • Chun-Nan Hsu
Abstract

BACKGROUND Decades of genome-wide association studies (GWAS) have accumulated large volumes of genomic data that could be reused to increase the statistical power of new studies. However, genotyping platforms with different marker sets have been used as biotechnology has evolved, which prevents old and new data from being pooled and compared. For example, to pool data collected with 550K chips with newer data collected with 900K chips, the missing loci must be imputed. Many imputation algorithms have been developed, but the posterior probabilities they estimate are not a reliable measure of imputation quality. Recently, many studies have used an imputation quality score (IQS) to measure the quality of imputation. Estimating the IQS, however, requires knowing the true alleles. When the true alleles are unknown, a previously estimated IQS can be reused only if the population and the imputed loci are identical.

METHODS Here, we present a regression model that estimates IQS by learning from the imputation of loci with known alleles. We designed a small set of features, such as minor allele frequency and distance to the nearest known cross-over hotspot, for predicting IQS. We evaluated our regression models by estimating the IQS of imputations produced by BEAGLE for a set of GWAS data from the NCBI GEO database, collected from samples of different ethnic populations.

RESULTS We constructed a ν-SVR based approach as our regression model. Our evaluation shows that this regression model achieves mean squared errors below 0.02 and a correlation coefficient close to 0.75 across different imputation scenarios. We also show how the regression results can help remove false positives in association studies.

CONCLUSION Reliable estimation of IQS will facilitate the integration and reuse of existing genomic data for meta-analysis and secondary analysis. Experiments show that it is possible to use a small number of features to regress the IQS by learning from different training examples of imputation and IQS pairs.
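
To illustrate why the IQS can only be computed when the true alleles are known, the following minimal sketch computes a kappa-style score from paired true and imputed genotype calls. The abstract does not give the IQS formula, so the chance-corrected concordance used here, (P_observed - P_chance) / (1 - P_chance), is an assumption based on how the score is commonly described. The function name `imputation_quality_score` and the genotype labels are hypothetical.

```python
from collections import Counter


def imputation_quality_score(true_genotypes, imputed_genotypes):
    """Kappa-style IQS (assumed definition): concordance corrected for chance."""
    if len(true_genotypes) != len(imputed_genotypes) or not true_genotypes:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(true_genotypes)
    # Observed proportion of samples whose imputed genotype matches the truth.
    matches = sum(t == i for t, i in zip(true_genotypes, imputed_genotypes))
    p_observed = matches / n
    # Agreement expected by chance, from the marginal counts of each call set.
    true_counts = Counter(true_genotypes)
    imputed_counts = Counter(imputed_genotypes)
    p_chance = sum(true_counts[g] * imputed_counts[g] for g in true_counts) / (n * n)
    if p_chance == 1.0:
        # Both call sets contain one identical genotype: the score is undefined.
        return float("nan")
    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical calls at one locus for six samples.
true_calls = ["AA", "AA", "AB", "BB", "AB", "AA"]
imputed_calls = ["AA", "AA", "AB", "AB", "AB", "AA"]
print(imputation_quality_score(true_calls, imputed_calls))  # 5/7, about 0.71

# Rare-allele locus: raw concordance is 0.9, but the chance-corrected score is 0.
rare_true = ["AA"] * 9 + ["AB"]
rare_imputed = ["AA"] * 10
print(imputation_quality_score(rare_true, rare_imputed))
```

The second example shows the motivation for a chance-corrected score: always imputing the common genotype yields high raw concordance at a rare-variant locus while carrying no information. Because both inputs are required, a score like this cannot be evaluated where the true alleles are missing, which is the gap the regression model described above is meant to fill.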

Similar resources

Predicting tensile strength of rocks from physical properties based on support vector regression optimized by cultural algorithm

The tensile strength (TS) of rocks is an important parameter in the design of a variety of engineering structures such as surface and underground mines, dam foundations, various types of tunnels and excavations, and oil wells. In addition, the physical properties of a rock are intrinsic characteristics that influence its mechanical behavior at a fundamental level. In this paper, a new approach co...


Development of a Pharmacogenomics Model based on Support Vector Regression with Optimal Features Selection Approach to Determine the Initial Therapeutic Dose of Warfarin Anticoagulant Drug

Introduction: Using artificial intelligence tools in pharmacogenomics is one of the latest bioinformatics research fields. One of the most important drugs whose initial therapeutic dose is difficult to determine is the anticoagulant warfarin. Warfarin is an oral anticoagulant that, due to its narrow therapeutic window and complex interrelationships of individual factors, the selection of its ...


Toward a Thorough Approach to Predicting Klinkenberg Permeability in a Tight Gas Reservoir: A Comparative Study

Klinkenberg permeability is an important parameter in tight gas reservoirs. There are conventional methods for determining it, but these methods depend on core permeability. Cores are few in number, but well logs are usually accessible for all wells and provide continuous information. In this regard, regression methods have been used to achieve reliable relations between log readings and Klinke...


Support vector regression for prediction of gas reservoirs permeability

Reservoir permeability is a critical parameter for characterization of the hydrocarbon reservoirs. In fact, determination of permeability is a crucial task in reserve estimation, production and development. Traditional methods for permeability prediction are well log and core data analysis which are very expensive and time-consuming. Well log data is an alternative approach for prediction of pe...


Journal title:

Volume 6, Issue

Pages -

Publication date: 2012